Learning a Perceptron-Based Named Entity Chunker via Online Recognition Feedback

Authors

  • Xavier Carreras
  • Lluís Màrquez i Villodre
  • Lluís Padró
Abstract

We present a novel approach to the problem of Named Entity Recognition and Classification (NERC), in the context of the CoNLL-2003 Shared Task. Our work is framed within the learning and inference paradigm for recognizing structures in natural language (Punyakanok and Roth, 2001; Carreras et al., 2002). We make use of several learned functions which, applied at local contexts, discriminatively select optimal partial structures. On top of this local recognition, an inference layer explores the partial structures and builds the optimal global structure for the problem. For the NERC problem, the structures to be recognized are the named entity (NE) phrases of a sentence. First, we apply learning at word level to identify NE candidates by means of a Begin-Inside classification. Then, we make use of functions learned at phrase level, one for each NE category, to discriminate among competing NEs. We propose a simple online learning algorithm for training all the involved functions together. Each function is modeled as a voted perceptron (Freund and Schapire, 1999). The learning strategy works online at sentence level. When visiting a sentence, the functions being learned are first used to recognize the NE phrases, and then updated according to the correctness of their solution. We analyze the dependencies among the involved perceptrons and a global solution in order to design a global update rule based on the recognition of named entities, which reflects to each individual perceptron the errors it committed from a global perspective. The learning approach presented here is closely related to, and inspired by, some recent work in NLP and Machine Learning. Collins (2002) adapted the perceptron learning algorithm to tagging tasks, via sentence-based global feedback. Crammer and Singer (2003) presented an online topic-ranking algorithm involving several perceptrons and ranking-based update rules for training them.
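The abstract's two-layer recognition and sentence-level feedback can be illustrated with a small sketch. It is not the authors' implementation. It assumes plain perceptrons instead of voted ones, toy features, a greedy selection of non-overlapping phrases in place of the paper's inference layer, and a simplified update rule (promote on missed phrases, demote the phrase scorer on over-predicted ones). All names (`Perceptron`, `Chunker`, `word_feats`, `phrase_feats`) are hypothetical.

```python
from collections import defaultdict

CATEGORIES = ["PER", "LOC", "ORG", "MISC"]  # CoNLL-2003 entity types

class Perceptron:
    """Plain (non-voted) perceptron over sparse binary features; a simplification."""
    def __init__(self):
        self.w = defaultdict(float)
    def score(self, feats):
        return sum(self.w.get(f, 0.0) for f in feats)
    def update(self, feats, delta):
        for f in feats:
            self.w[f] += delta

def word_feats(words, i):
    # Toy word-level features (assumed for illustration only).
    prev = words[i - 1].lower() if i > 0 else "<s>"
    return [f"w={words[i].lower()}", f"cap={words[i][0].isupper()}", f"prev={prev}"]

def phrase_feats(words, s, e):
    # Toy phrase-level features for the span words[s:e].
    return [f"form={' '.join(words[s:e]).lower()}", f"len={e - s}",
            f"first={words[s].lower()}"]

class Chunker:
    def __init__(self):
        self.begin, self.inside = Perceptron(), Perceptron()
        self.phrase = {c: Perceptron() for c in CATEGORIES}

    def candidates(self, words):
        """Word level: spans that start on a Begin word and continue over Inside words."""
        spans = []
        for s in range(len(words)):
            if self.begin.score(word_feats(words, s)) <= 0:
                continue
            e = s + 1
            spans.append((s, e))
            while e < len(words) and self.inside.score(word_feats(words, e)) > 0:
                e += 1
                spans.append((s, e))
        return spans

    def recognize(self, words):
        """Phrase level: score candidates per category, then pick non-overlapping ones."""
        scored = []
        for s, e in self.candidates(words):
            feats = phrase_feats(words, s, e)
            cat = max(CATEGORIES, key=lambda c: self.phrase[c].score(feats))
            score = self.phrase[cat].score(feats)
            if score > 0:
                scored.append((score, s, e, cat))
        chosen, used = set(), set()
        for score, s, e, cat in sorted(scored, key=lambda t: -t[0]):  # greedy, not exact
            if used.isdisjoint(range(s, e)):
                chosen.add((s, e, cat))
                used.update(range(s, e))
        return chosen

    def train_sentence(self, words, gold):
        """Recognize first, then update every perceptron from the global errors."""
        predicted = self.recognize(words)
        for s, e, cat in sorted(gold - predicted):      # missed phrases: promote
            if self.begin.score(word_feats(words, s)) <= 0:
                self.begin.update(word_feats(words, s), +1.0)
            for i in range(s + 1, e):
                if self.inside.score(word_feats(words, i)) <= 0:
                    self.inside.update(word_feats(words, i), +1.0)
            self.phrase[cat].update(phrase_feats(words, s, e), +1.0)
        for s, e, cat in sorted(predicted - gold):      # over-predicted phrases: demote
            self.phrase[cat].update(phrase_feats(words, s, e), -1.0)

# Usage on a single toy sentence; spans are (start, end_exclusive, category).
words = ["John", "Smith", "visited", "Paris"]
gold = {(0, 2, "PER"), (3, 4, "LOC")}
model = Chunker()
for _ in range(5):
    model.train_sentence(words, gold)
```

The point of the sketch is the order of operations in `train_sentence`: the current functions are first used to recognize the sentence, and only the errors visible in the global solution trigger updates, so the word-level and phrase-level perceptrons are trained together rather than as independent classifiers.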

Similar resources

Phrase recognition by filtering and ranking with perceptrons

We present a phrase recognition system based on perceptrons, and an online learning algorithm to train them together. The recognition strategy applies learning in two layers, first at word level, to filter words and form phrase candidates, second at phrase level, to rank phrases and select the optimal ones. We provide a global feedback rule which reflects the dependencies among perceptrons and ...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three kinds of methods have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on Persian if they are trained with good features. To get good performanc...

Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also regarded as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Improving Persian Named Entity Recognition Using Kasre Ezafe

Named entity recognition is a process in which the names of people, places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), as well as dates, currencies and percentages, are identified in a text. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

Publication date: 2003